[Opt](cloud) Add rate limit for BE to MS rpc #60344

Merged
gavinchou merged 3 commits into apache:master from bobhan1:be-ms-rpc-rate-limit
May 9, 2026

Conversation

@bobhan1 (Contributor) commented Jan 29, 2026

What problem does this PR solve?

Problem Summary:

This PR implements a two-layer MS (Meta Service) RPC rate limiting system for Doris cloud mode BE:

  1. Host-level rate limiting — Token-bucket based QPS limiter for all 21 MS RPC types, preventing a single BE from overwhelming the MS with burst traffic.
  2. Table-level adaptive backpressure — When MS returns the MAX_QPS_LIMIT (MS busy) error code, BE dynamically identifies and throttles the top-k highest-QPS tables via a state machine, then automatically relaxes the limits after the pressure subsides.

Part 1: BE Host-Level Rate Limiting

Problem

In cloud mode, all BE nodes send RPCs (get_tablet, prepare_rowset, commit_rowset, etc.) to a shared Meta Service. A single BE experiencing load spikes (e.g., large batch imports, compaction storms) can send excessive RPC traffic that overwhelms MS, degrading service for all BEs.

Solution

Introduce HostLevelMSRpcRateLimiters — a per-BE, per-RPC-type rate limiter using token bucket algorithm.

Architecture:

  • All 21 MS RPC types are enumerated in MetaServiceRPC enum (defined via X-macro for maintainability)
  • Each RPC type has an independent TokenBucketRateLimiterHolder with its own QPS limit
  • QPS limits are configured per CPU core: actual_qps = config_value × num_cores
  • Thread-safe design using atomic_shared_ptr<RpcRateLimiter> array for lock-free concurrent access during limit() calls
  • Each rate limiter includes a bvar::LatencyRecorder to monitor sleep durations caused by rate limiting
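The X-macro pattern mentioned above can be sketched as follows. This is an illustrative minimum, not the PR's actual code: the macro name `FOR_EACH_MS_RPC` and the three listed RPCs are assumptions, and the real enum covers all 21 types. One list drives the enum, the name table, and (via `COUNT`) the limiter array size, so adding an RPC type is a one-line change.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical X-macro: each RPC type is listed exactly once.
#define FOR_EACH_MS_RPC(M) \
    M(GET_TABLET)          \
    M(PREPARE_ROWSET)      \
    M(COMMIT_ROWSET)
// ... remaining RPC types elided for brevity

enum class MetaServiceRPC {
#define DEFINE_ENUM(name) name,
    FOR_EACH_MS_RPC(DEFINE_ENUM)
#undef DEFINE_ENUM
    COUNT // total number of RPC types; sizes the per-type limiter array
};

// Name table generated from the same list, kept in sync by construction.
constexpr std::string_view kRpcNames[] = {
#define DEFINE_NAME(name) #name,
    FOR_EACH_MS_RPC(DEFINE_NAME)
#undef DEFINE_NAME
};
```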

Configuration:

| Config | Default | Description |
|---|---|---|
| enable_ms_rpc_host_level_rate_limit | true | Global enable/disable switch |
| ms_rpc_qps_default | 100 | Default per-core QPS for all RPCs |
| ms_rpc_qps_<rpc_name> | -1 | Per-RPC override (-1 = use default, 0 = disabled) |

All QPS configs are mutable (DEFINE_mInt32), allowing runtime adjustment without restart. reset_all() re-reads configs and recreates rate limiters.

Integration:

Rate limiting is applied inside the retry_rpc() template function in cloud_meta_mgr.cpp, which wraps all MS RPC calls. The RpcRateLimitCtx struct carries the rate limiter reference. Rate limiting executes before each RPC attempt (including retries), with the call to apply_rate_limit() performing a bthread_usleep if the token bucket requires waiting.
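A token-bucket check of the shape described above can be sketched roughly as follows. All names here are hypothetical: the PR's real limiter lives in TokenBucketRateLimiterHolder, and the caller sleeps via bthread_usleep on the returned wait rather than blocking inside the bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Minimal token bucket: tokens refill at `qps` per second, capped at `burst`.
// Returns how many microseconds the caller should sleep before proceeding.
class TokenBucket {
public:
    TokenBucket(double qps, double burst) : qps_(qps), tokens_(burst), burst_(burst) {}

    int64_t acquire_wait_us(int64_t now_us) {
        // Refill based on time elapsed since the previous acquire.
        tokens_ = std::min(burst_, tokens_ + (now_us - last_us_) * qps_ / 1e6);
        last_us_ = now_us;
        tokens_ -= 1.0;             // consume one token for this RPC attempt
        if (tokens_ >= 0) return 0; // token available, no wait needed
        // Bucket overdrawn: wait until the deficit would be refilled.
        return static_cast<int64_t>(-tokens_ * 1e6 / qps_);
    }

private:
    double qps_;
    double tokens_;
    double burst_;
    int64_t last_us_ = 0;
};
```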

New files:

  • be/src/cloud/cloud_ms_rpc_rate_limiters.h / .cpp
  • be/test/cloud/cloud_ms_rpc_rate_limiters_test.cpp

Part 2: BE Table-Level Adaptive Backpressure

Problem

Host-level rate limiting applies uniformly across all tables. When MS reports overload (MAX_QPS_LIMIT), it's often caused by a small number of high-traffic tables (e.g., tables with many concurrent stream load jobs). A uniform rate limit would unnecessarily penalize all tables, while the hot tables continue to dominate the RPC traffic.

Solution

Implement table-level adaptive throttling for load-related RPCs. When MS returns MAX_QPS_LIMIT, BE identifies the top-k highest-QPS tables and progressively reduces their QPS limits, while leaving other tables unaffected.

Scope: Only 5 load-related RPC types participate in table-level throttling:

  • PREPARE_ROWSET
  • COMMIT_ROWSET
  • UPDATE_TMP_ROWSET
  • UPDATE_PACKED_FILE_INFO
  • UPDATE_DELETE_BITMAP

Architecture (four core components plus an orchestrator, with clear separation of concerns):

MS_BUSY signal (MAX_QPS_LIMIT)
    │
    ▼
┌─────────────────────────┐       ┌──────────────────────────┐
│  RpcThrottleCoordinator │──────▶│  RpcThrottleStateMachine │
│  (timing control)       │       │  (pure state logic)      │
│  - upgrade cooldown     │       │  - upgrade history stack │
│  - downgrade trigger    │       │  - limit calculation     │
└─────────────────────────┘       └──────────┬───────────────┘
                                             │ Actions
                                             ▼
┌─────────────────────────┐       ┌──────────────────────────┐
│  TableRpcQpsRegistry    │       │  TableRpcThrottler       │
│  (QPS statistics)       │       │  (limit enforcement)     │
│  - per-table bvar       │       │  - StrictQpsLimiter      │
│  - top-k query          │       │  - per (rpc, table)      │
└─────────────────────────┘       └──────────────────────────┘

Component details:

  1. TableRpcQpsRegistry — Tracks per-(rpc_type, table_id) QPS using bvar::PerSecond<bvar::Adder>. Supports an efficient top-k query via a min-heap. The time window is configurable via ms_rpc_table_qps_window_sec (immutable; see the configuration table below).

  2. RpcThrottleStateMachine — Pure state machine with no time awareness or side effects. Maintains upgrade history as a stack for clean rollback.

    • on_upgrade(snapshot): For each top-k table in the QPS snapshot, calculates new_limit = current_qps × ratio (first time) or current_limit × ratio (already limited), with a floor of ms_rpc_table_qps_limit_floor. Returns SET_LIMIT actions.
    • on_downgrade(): Pops the most recent upgrade from history. If the table had a prior limit, restores it (SET_LIMIT). If no prior limit, removes it (REMOVE_LIMIT).
  3. RpcThrottleCoordinator — Timing control layer using tick counts (1 tick = 1 ms).

    • report_ms_busy(): Returns true if enough ticks have passed since last upgrade (cooldown).
    • tick(n): Advances time by n ticks. Returns true if downgrade should trigger (no MS_BUSY for downgrade_after_ticks).
  4. TableRpcThrottler — Enforces QPS limits using StrictQpsLimiter (strict fixed-interval, no burst allowed). Each (rpc_type, table_id) pair has its own limiter. Returns the time point when the request may execute; the caller sleeps until then.

  5. MSBackpressureHandler — Orchestrator that wires all components together:

    • on_ms_busy(): Called when retry_rpc receives MAX_QPS_LIMIT. Consults coordinator for cooldown, builds QPS snapshot from registry, feeds to state machine, applies resulting actions to throttler.
    • before_rpc() / after_rpc(): Called around each load-related RPC for throttle enforcement and QPS recording.
    • Background tick thread: Runs every 1 second, advances coordinator by 1000 ticks. Triggers downgrade when enough time has passed without MS_BUSY.

Upgrade/Downgrade lifecycle example (illustrative values: a 5 s cooldown/downgrade interval and a 0.5 throttle ratio):

Time 0s:   MS returns MAX_QPS_LIMIT
           → Upgrade level 1: top-2 tables (A: 100 qps, B: 80 qps)
             → A limited to 50 qps, B limited to 40 qps

Time 2s:   MS returns MAX_QPS_LIMIT again (cooldown 5s not passed)
           → Skipped

Time 6s:   MS returns MAX_QPS_LIMIT (cooldown passed)
           → Upgrade level 2: top-2 tables now (A: 50 qps, C: 60 qps)
             → A limited to 25 qps, C limited to 30 qps

Time 11s:  No MS_BUSY for 5s
           → Downgrade: undo level 2
             → A restored to 50 qps, C limit removed

Time 16s:  No MS_BUSY for 5s
           → Downgrade: undo level 1
             → A limit removed, B limit removed
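The lifecycle above maps directly onto the state machine's upgrade history stack. The sketch below models that logic under stated assumptions: all names are illustrative, the RPC-type dimension is dropped, and the SET_LIMIT/REMOVE_LIMIT actions are folded into a limits map for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <stack>
#include <utility>
#include <vector>

class ThrottleStateMachine {
public:
    ThrottleStateMachine(double ratio, double floor) : ratio_(ratio), floor_(floor) {}

    // snapshot: the top-k tables and their current QPS at upgrade time.
    void on_upgrade(const std::vector<std::pair<int64_t, double>>& snapshot) {
        std::map<int64_t, std::optional<double>> prior; // limits before this upgrade
        for (auto [table, qps] : snapshot) {
            auto it = limits_.find(table);
            prior[table] = (it == limits_.end()) ? std::nullopt
                                                 : std::optional<double>(it->second);
            // First time: decay the observed QPS; already limited: decay the limit.
            double base = prior[table] ? *prior[table] : qps;
            limits_[table] = std::max(base * ratio_, floor_);
        }
        history_.push(std::move(prior)); // remember how to undo this level
    }

    void on_downgrade() {
        if (history_.empty()) return;
        for (auto& [table, prior] : history_.top()) {
            if (prior) limits_[table] = *prior; // restore the previous limit
            else limits_.erase(table);          // no prior limit: remove it
        }
        history_.pop();
    }

    std::optional<double> limit_of(int64_t table) const {
        auto it = limits_.find(table);
        return it == limits_.end() ? std::nullopt : std::optional<double>(it->second);
    }

private:
    double ratio_, floor_;
    std::map<int64_t, double> limits_;
    std::stack<std::map<int64_t, std::optional<double>>> history_;
};
```

Replaying the lifecycle example with ratio 0.5 reproduces the same limits: level 1 caps A at 50 and B at 40; level 2 caps A at 25 and C at 30; each downgrade pops one level off the stack.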

Configuration:

| Config | Default | Mutable | Description |
|---|---|---|---|
| enable_ms_backpressure_handling | false | Yes | Global enable/disable switch |
| ms_rpc_table_qps_window_sec | 3 | No | bvar time window for QPS calculation |
| ms_backpressure_upgrade_interval_ms | 3000 | Yes | Minimum cooldown between upgrades |
| ms_backpressure_upgrade_top_k | 2 | Yes | Number of top tables to throttle per upgrade |
| ms_backpressure_throttle_ratio | 0.75 | Yes | QPS decay ratio on upgrade |
| ms_rpc_table_qps_limit_floor | 1.0 | Yes | Minimum QPS limit (won't throttle below this) |
| ms_backpressure_downgrade_interval_ms | 3000 | Yes | Time without MS_BUSY before downgrade |

Observability (bvar metrics):

  • ms_rpc_backpressure_upgrade_count / _60s — Upgrade event counts
  • ms_rpc_backpressure_downgrade_count / _60s — Downgrade event counts
  • ms_rpc_backpressure_ms_busy_count / _60s — MS_BUSY signal counts
  • ms_rpc_backpressure_throttle_wait_<rpc_name> — Per-RPC-type throttle wait latency
  • ms_rpc_backpressure_throttled_tables_<rpc_name> — Number of throttled tables per RPC type

New files:

  • be/src/cloud/cloud_throttle_state_machine.h / .cpp
  • be/src/cloud/cloud_ms_backpressure_handler.h / .cpp
  • be/test/cloud/cloud_throttle_state_machine_test.cpp
  • be/test/cloud/cloud_ms_backpressure_handler_test.cpp

Also renamed (not part of the feature, cleanup):

  • common/cpp/s3_rate_limiter.h/.cpp → common/cpp/token_bucket_rate_limiter.h/.cpp (more general naming, since the limiter is now used beyond S3)

Part 3: System Table for Table-Level Throttler Observability

Problem

The table-level backpressure system operates transparently inside BE. When issues arise, users and DBAs have no way to inspect which tables are being throttled, what their QPS limits are, or what their current QPS is — beyond checking raw bvar metrics.

Solution

Add a new system table information_schema.backend_ms_rpc_table_throttlers that exposes the real-time state of the TableRpcThrottler on each BE. This is a Backend-Partitioned Schema Table: each BE reports its own throttling data, and queries are distributed to all alive BEs and aggregated.

Schema:

| Column | Type | Description |
|---|---|---|
| BE_ID | BIGINT | Backend ID |
| TABLE_ID | BIGINT | Table ID being throttled |
| RPC_TYPE | VARCHAR(64) | RPC type name (e.g., PREPARE_ROWSET, COMMIT_ROWSET) |
| QPS_LIMIT | DOUBLE | Current QPS limit enforced on this (table, rpc) pair |
| CURRENT_QPS | DOUBLE | Current observed QPS for this (table, rpc) pair |

Usage examples:

-- View all currently throttled tables across all BEs
SELECT * FROM information_schema.backend_ms_rpc_table_throttlers;

-- View throttled tables on a specific BE
SELECT * FROM information_schema.backend_ms_rpc_table_throttlers WHERE BE_ID = 10001;

-- Find the most severely throttled tables
SELECT * FROM information_schema.backend_ms_rpc_table_throttlers ORDER BY QPS_LIMIT ASC;

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
      • Added host-level token-bucket rate limiting for all MS RPCs (enabled by default via enable_ms_rpc_host_level_rate_limit)
      • Added table-level adaptive backpressure handling triggered by MS MAX_QPS_LIMIT response (disabled by default via enable_ms_backpressure_handling)
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor):

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1 bobhan1 force-pushed the be-ms-rpc-rate-limit branch 3 times, most recently from 28843ee to 6e33c81 Compare January 29, 2026 08:26
@bobhan1 bobhan1 force-pushed the be-ms-rpc-rate-limit branch 11 times, most recently from 82ed77c to a8239ac Compare February 11, 2026 11:25
@bobhan1 bobhan1 marked this pull request as ready for review February 11, 2026 11:26

bobhan1 commented Feb 11, 2026

run buildall

@doris-robot:

Cloud UT Coverage Report

Increment line coverage 9.68% (3/31) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.29% (1796/2265)
Line Coverage 64.86% (32023/49369)
Region Coverage 65.56% (15992/24394)
Branch Coverage 56.07% (8505/15168)

@hello-stephen (Contributor):

FE UT Coverage Report

Increment line coverage 100.00% (10/10) 🎉
Increment coverage report
Complete coverage report

@doris-robot:

TPC-H: Total hot run time: 30385 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bdd55d6cb3a5862cc5ec61e49dca8fe680e0e3aa, data reload: false

------ Round 1 ----------------------------------
q1	17679	4467	4272	4272
q2	2039	390	244	244
q3	10109	1308	749	749
q4	10207	793	319	319
q5	7551	2209	1958	1958
q6	205	184	150	150
q7	882	748	619	619
q8	9263	1420	1155	1155
q9	4769	4700	4617	4617
q10	6827	1937	1546	1546
q11	494	265	237	237
q12	345	383	229	229
q13	17782	4100	3220	3220
q14	231	233	223	223
q15	906	821	818	818
q16	701	679	604	604
q17	713	816	547	547
q18	7109	5985	5815	5815
q19	1106	1011	627	627
q20	513	503	393	393
q21	2575	1838	1794	1794
q22	327	289	249	249
Total cold run time: 102333 ms
Total hot run time: 30385 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4383	4324	4319	4319
q2	280	336	259	259
q3	2125	2720	2212	2212
q4	1368	1746	1307	1307
q5	4336	4218	4183	4183
q6	220	193	137	137
q7	1879	1830	1702	1702
q8	2673	2631	2489	2489
q9	7645	7429	7470	7429
q10	2810	3278	2678	2678
q11	501	439	433	433
q12	682	728	644	644
q13	3895	4764	3545	3545
q14	308	348	302	302
q15	849	810	807	807
q16	698	740	678	678
q17	1146	1339	1393	1339
q18	8283	7969	8028	7969
q19	887	884	859	859
q20	2104	2189	2065	2065
q21	4771	4677	4393	4393
q22	527	469	485	469
Total cold run time: 52370 ms
Total hot run time: 50218 ms

@doris-robot:

ClickBench: Total hot run time: 28.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bdd55d6cb3a5862cc5ec61e49dca8fe680e0e3aa, data reload: false

query1	0.05	0.04	0.04
query2	0.10	0.05	0.05
query3	0.25	0.08	0.08
query4	1.61	0.12	0.11
query5	0.27	0.26	0.25
query6	1.16	0.68	0.66
query7	0.03	0.03	0.03
query8	0.06	0.04	0.04
query9	0.56	0.52	0.49
query10	0.55	0.54	0.54
query11	0.14	0.10	0.10
query12	0.14	0.10	0.10
query13	0.64	0.61	0.62
query14	1.06	1.06	1.05
query15	0.88	0.86	0.87
query16	0.41	0.39	0.40
query17	1.07	1.05	1.14
query18	0.22	0.22	0.21
query19	2.10	2.02	2.01
query20	0.02	0.02	0.01
query21	15.39	0.26	0.15
query22	5.43	0.06	0.05
query23	16.17	0.28	0.11
query24	1.49	0.34	0.23
query25	0.09	0.07	0.08
query26	0.15	0.14	0.15
query27	0.11	0.09	0.05
query28	3.59	1.17	0.97
query29	12.56	3.95	3.19
query30	0.27	0.14	0.12
query31	2.81	0.65	0.41
query32	3.25	0.58	0.50
query33	3.24	3.25	3.27
query34	16.04	5.41	4.73
query35	4.80	4.80	4.80
query36	0.65	0.50	0.48
query37	0.12	0.07	0.07
query38	0.08	0.04	0.05
query39	0.05	0.03	0.03
query40	0.22	0.16	0.14
query41	0.09	0.03	0.03
query42	0.05	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 98.02 s
Total hot run time: 28.3 s

@hello-stephen (Contributor):

BE UT Coverage Report

Increment line coverage 61.31% (618/1008) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.83% (19544/36995)
Line Coverage 36.31% (182073/501480)
Region Coverage 32.72% (141422/432179)
Branch Coverage 33.70% (61194/181565)

@bobhan1 bobhan1 force-pushed the be-ms-rpc-rate-limit branch from bdd55d6 to b345ed1 Compare February 27, 2026 06:52

bobhan1 commented Feb 27, 2026

run buildall

@bobhan1 bobhan1 force-pushed the be-ms-rpc-rate-limit branch 2 times, most recently from 93f86d5 to bb1452b Compare February 27, 2026 07:25
@doris-robot:

Cloud UT Coverage Report

Increment line coverage 9.68% (3/31) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.30% (1797/2266)
Line Coverage 64.83% (32066/49462)
Region Coverage 65.50% (16012/24447)
Branch Coverage 55.96% (8513/15212)

@doris-robot:

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.99% (20064/37867)
Line Coverage 36.54% (188291/515240)
Region Coverage 32.82% (146214/445472)
Branch Coverage 33.94% (63963/188451)

@hello-stephen (Contributor):

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.57% (27286/37086)
Line Coverage 57.14% (293528/513696)
Region Coverage 54.21% (243731/449588)
Branch Coverage 56.02% (105890/189017)


bobhan1 commented Apr 1, 2026

run cloudut


bobhan1 commented Apr 1, 2026

run external

@hello-stephen (Contributor):

Cloud UT Coverage Report

Increment line coverage 9.68% (3/31) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.46% (1799/2293)
Line Coverage 64.18% (32322/50365)
Region Coverage 65.01% (16215/24942)
Branch Coverage 55.46% (8640/15580)

@hello-stephen (Contributor):

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.57% (27283/37086)
Line Coverage 57.15% (293568/513696)
Region Coverage 54.24% (243841/449588)
Branch Coverage 56.04% (105923/189017)


bobhan1 commented Apr 1, 2026

run external

@hello-stephen (Contributor):

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.57% (27285/37086)
Line Coverage 57.15% (293553/513696)
Region Coverage 54.26% (243944/449588)
Branch Coverage 56.05% (105946/189017)

liaoxin01 (Contributor) previously approved these changes Apr 1, 2026

@liaoxin01 left a comment:

LGTM

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 1, 2026

github-actions Bot commented Apr 1, 2026

PR approved by at least one committer and no changes requested.


github-actions Bot commented Apr 1, 2026

PR approved by anyone and no changes requested.

@bobhan1 bobhan1 force-pushed the be-ms-rpc-rate-limit branch from 3e89cf3 to fb5dbff Compare May 6, 2026 10:48
@bobhan1 bobhan1 force-pushed the be-ms-rpc-rate-limit branch from fb5dbff to 5eedb99 Compare May 6, 2026 11:04
@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label May 6, 2026

bobhan1 commented May 6, 2026

run buildall

bobhan1 added 3 commits May 7, 2026 10:05
fix

[improvement](be) Hook dynamic MS throttle configs to update callbacks

Issue Number: None

Related PR: None

Problem Summary: Newly added BE configs for per-RPC MS QPS limits and MS backpressure throttle upgrade/downgrade only changed config values at runtime, but did not propagate those changes into the in-memory rate limiter and backpressure handler state. This commit registers DEFINE_ON_UPDATE callbacks for those configs and refreshes the corresponding runtime objects only when the new value differs from the old value.

None

- Test: No need to test (code change committed without rerunning build in this step)
- Behavior changed: Yes (runtime config updates now take effect on the corresponding in-memory MS throttling state)
- Does this need documentation: No

update

fix sync rowset retry and make MSBackpressureHandler state transitions atomic

fix wrong substitution

[fix](be) Log actual throttle ticks on transition

Issue Number: None

Related PR: None

Problem Summary: Capture the actual elapsed tick counters before resetting them so the ms-throttle upgrade and downgrade logs report real values instead of reset counters.

None

- Test: No need to test (log-only change; attempted targeted BE UT but sandbox blocked submodule update)

- Behavior changed: Yes (INFO logs now print the actual elapsed ticks for upgrade and downgrade triggers)

- Does this need documentation: No

[fix](be) Disable MS backpressure handling by default

Issue Number: None

Related PR: None

Problem Summary: Change the default value of enable_ms_backpressure_handling to false so MS backpressure response handling is opt-in instead of enabled by default.

MS backpressure handling is now disabled by default.

- Test: No need to test (single default-config change only)

- Behavior changed: Yes (enable_ms_backpressure_handling defaults to false)

- Does this need documentation: No

format

change enable_ms_rpc_host_level_rate_limit default to false
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: ExecEnv forward-declared doris::cloud MS RPC limiter types, which exposed doris::cloud through common include paths and made older headers resolve global cloud protobuf types incorrectly.

### Release note

None

### Check List (For Author)

- Test: Manual test

    - ./build.sh --be -j100

- Behavior changed: No

- Does this need documentation: No
@bobhan1 bobhan1 force-pushed the be-ms-rpc-rate-limit branch from 2e825e8 to a79fdb1 Compare May 7, 2026 02:16

bobhan1 commented May 7, 2026

run buildall

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 9, 2026

github-actions Bot commented May 9, 2026

PR approved by at least one committer and no changes requested.

@gavinchou gavinchou merged commit eaa4ef9 into apache:master May 9, 2026
29 of 31 checks passed